chore: align MCP corruption manifest with evidence index #161
Code Review
This pull request adds support for MCP trace corruption manifests by updating the evidence index and the generation script. It includes logic to extract fixture families from corruption sources. Feedback was provided to improve the robustness of the regex used for family extraction, which currently fails on baseline fixtures, and to ensure stricter JSON list processing by raising a RuntimeError for invalid types in the 'corruptions' field.
    return {str(fixture["family"]) for fixture in manifest["fixtures"]}


def _family_from_fixture_slug(slug: str) -> str | None:
    match = re.match(r"^(?P<family>.+)_[^_]+_v\d+$", slug)
The regex `r"^(?P<family>.+)_[^_]+_v\d+$"` is incorrect for fixture slugs that do not include a degradation level (e.g., baseline fixtures like `mcp_trace_replay_v1`). For such slugs it extracts `mcp_trace` as the family instead of `mcp_trace_replay`, because it always treats the last underscore-delimited segment before the version as a degradation level, even when that segment belongs to the family name. Using a non-greedy match for the family and explicitly handling the optional degradation levels is more robust.
Suggested change:
-    match = re.match(r"^(?P<family>.+)_[^_]+_v\d+$", slug)
+    match = re.match(r"^(?P<family>.+?)(?:_(?:mild|moderate|degraded))?_v\d+$", slug)
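The difference between the two patterns can be checked side by side. The following is a minimal sketch, not code from the PR: the helper names `family_original` and `family_suggested` are made up here, `mcp_trace_replay_v1` is the baseline slug mentioned in the comment, and `mcp_trace_replay_mild_v2` is a hypothetical degraded slug assumed for the comparison.

```python
from __future__ import annotations

import re

# Pattern currently in the PR: always strips one segment before the version.
ORIGINAL = re.compile(r"^(?P<family>.+)_[^_]+_v\d+$")
# Pattern from the suggestion: the degradation level is optional and explicit.
SUGGESTED = re.compile(r"^(?P<family>.+?)(?:_(?:mild|moderate|degraded))?_v\d+$")


def family_original(slug: str) -> str | None:
    """Family as extracted by the pattern currently in the PR."""
    match = ORIGINAL.match(slug)
    return match.group("family") if match else None


def family_suggested(slug: str) -> str | None:
    """Family as extracted by the suggested pattern."""
    match = SUGGESTED.match(slug)
    return match.group("family") if match else None


if __name__ == "__main__":
    # Slugs are illustrative; only the first one appears in the review comment.
    for slug in ("mcp_trace_replay_v1", "mcp_trace_replay_mild_v2", "no_version"):
        print(slug, family_original(slug), family_suggested(slug))
```

Both patterns agree on the degraded slug and both reject a slug with no `_v<N>` suffix; they differ only on the baseline slug. One caveat of the suggested pattern, assuming the three listed levels are not exhaustive: a slug with an unlisted level would keep that level as part of the family name.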
    if isinstance(payload.get("corruptions"), list):
        for corruption in payload["corruptions"]:
            if not isinstance(corruption, dict):
                continue
            source_fixture = corruption.get("source_fixture")
            if not isinstance(source_fixture, str):
                continue
            fixture_slug = source_fixture.rsplit("/", 1)[-1]
            family = _family_from_fixture_slug(fixture_slug)
            if family:
                families.add(family)
This block violates the general rule regarding JSON list processing. It should treat null as an empty list and raise a RuntimeError for other non-list types to maintain strictness. Additionally, use type guards like isinstance(item, dict) before accessing fields within list items to prevent crashes on malformed data.
Suggested change:
-    if isinstance(payload.get("corruptions"), list):
-        for corruption in payload["corruptions"]:
-            if not isinstance(corruption, dict):
-                continue
-            source_fixture = corruption.get("source_fixture")
-            if not isinstance(source_fixture, str):
-                continue
-            fixture_slug = source_fixture.rsplit("/", 1)[-1]
-            family = _family_from_fixture_slug(fixture_slug)
-            if family:
-                families.add(family)
+    corruptions = payload.get("corruptions")
+    if corruptions is not None:
+        if not isinstance(corruptions, list):
+            raise RuntimeError("Field 'corruptions' must be a list if present")
+        for corruption in corruptions:
+            if not isinstance(corruption, dict):
+                continue
+            source_fixture = corruption.get("source_fixture")
+            if not isinstance(source_fixture, str):
+                continue
+            fixture_slug = source_fixture.rsplit("/", 1)[-1]
+            family = _family_from_fixture_slug(fixture_slug)
+            if family:
+                families.add(family)
References
- When processing JSON data, treat null values for expected list fields as empty lists, but raise a RuntimeError for other non-list types to maintain strictness. Use type guards like isinstance(item, dict) before accessing fields within list items to prevent crashes on malformed data.
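The rule above can be illustrated in isolation. This is a minimal sketch under stated assumptions, not the repository's code: the function name `corruption_source_slugs` is hypothetical, and the payload shape (a `corruptions` list of objects with a `source_fixture` string) is taken from the snippet under review.

```python
from __future__ import annotations

from typing import Any


def corruption_source_slugs(payload: dict[str, Any]) -> list[str]:
    """Return the final path segment of each corruption's source_fixture.

    Hypothetical helper: a missing or null 'corruptions' field counts as an
    empty list, any other non-list value raises, and malformed items are
    skipped instead of crashing.
    """
    corruptions = payload.get("corruptions")
    if corruptions is None:
        return []  # absent or null is treated as "no corruptions"
    if not isinstance(corruptions, list):
        raise RuntimeError("Field 'corruptions' must be a list if present")

    slugs: list[str] = []
    for corruption in corruptions:
        # Type guard before touching fields on a list item.
        if not isinstance(corruption, dict):
            continue
        source_fixture = corruption.get("source_fixture")
        if not isinstance(source_fixture, str):
            continue
        # Keep only the last path segment, as the reviewed code does.
        slugs.append(source_fixture.rsplit("/", 1)[-1])
    return slugs
```

With this shape, `{"corruptions": null}` and a payload without the field both yield an empty result, while a string or object in place of the list fails loudly instead of being silently ignored.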
Motivation
- `artifacts/mcp_trace_corruption_manifest.json` must be either registered in the deterministic evidence index or explicitly justified as out of scope, per repository evidence-index discipline.

Description
- Add `artifacts/mcp_trace_corruption_manifest.json` to `ARTIFACT_SPECS` in `scripts/generate_evidence_index.py`, with `generator` set to `scripts/generate_mcp_trace_corruptions.py` and `evidence_category` set to `corruption_manifest`.
- Add `_family_from_fixture_slug` and enhance `_extract_fixture_families` to infer families from `corruptions[*].source_fixture` slugs so the manifest can report `fixture_families` deterministically.
- Regenerate `artifacts/evidence_index.json` via the generator (deterministic output) so the new artifact appears in the committed index without hand-editing the generated file.
- Limit changes to `scripts/generate_evidence_index.py` and the regenerated `artifacts/evidence_index.json`; they do not alter schema shape or runtime behavior.

Testing
- Ran `python scripts/generate_evidence_index.py`, which produced `artifacts/evidence_index.json` successfully.
- Ran `pytest -q tests/test_evidence_index.py`; the tests passed (8 passed).
- Ran `npm run check` and the full test suite; both completed successfully (`pytest` reported `291 passed`).